METHODOLOGY AND CONVERGENCE RATES FOR
FUNCTIONAL LINEAR REGRESSION

By Peter Hall and Joel L. Horowitz

Australian National University and Northwestern University 

In functional linear regression, the slope "parameter" is a func- 
tion. Therefore, in a nonparametric context, it is determined by an 
infinite number of unknowns. Its estimation involves solving an ill- 
posed problem and has points of contact with a range of methodolo- 
gies, including statistical smoothing and deconvolution. The standard 
approach to estimating the slope function is based explicitly on func- 
tional principal components analysis and, consequently, on spectral 
decomposition in terms of eigenvalues and eigenfunctions. We dis- 
cuss this approach in detail and show that in certain circumstances, 
optimal convergence rates are achieved by the PCA technique. An 
alternative approach based on quadratic regularisation is suggested 
and shown to have advantages from some points of view. 

1. Introduction. In functional linear regression, data pairs $(X_i, Y_i)$ are
generated by the model

(1.1)    $Y_i = a + \int_{\mathcal I} b X_i + \varepsilon_i, \qquad 1 \le i \le n.$

The $X_i$'s are random functions, $\mathcal I$ denotes the interval on which each such
function is defined, the intercept $a$ and the errors $\varepsilon_i$ are scalars and the
slope $b$, the main object of our interest in this paper, is a function. The
model (1.1) is applicable in a wide range of settings, including many where
data are becoming available only through new developments in technology.
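
As a minimal sketch of how (1.1) might be handled numerically, the following generates data from a discretised version of the model. The grid, the midpoint quadrature rule, the particular slope `b`, the intercept `a`, the noise level and the cosine-based curves are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 100                       # sample size and grid size (arbitrary)
h = 1.0 / p                           # spacing of a midpoint grid on [0, 1]
t = (np.arange(p) + 0.5) * h

b = np.sin(2 * np.pi * t)             # hypothetical slope function b(t)
a = 1.5                               # hypothetical intercept
# Hypothetical random curves: Gaussian combinations of three cosines.
basis = np.sqrt(2) * np.cos(np.outer(np.arange(1, 4), np.pi * t))   # shape (3, p)
X = rng.normal(size=(n, 3)) @ basis                                 # shape (n, p)
eps = 0.1 * rng.normal(size=n)

# (1.1): Y_i = a + int_I b(t) X_i(t) dt + eps_i, integral by the midpoint rule.
Y = a + h * (X @ b) + eps
```

Each row of `X` holds one curve evaluated on the grid, so the integral in (1.1) reduces to a weighted inner product.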

For example, in near-infrared spectroscopy applied to data on different
cereal-grain types (e.g., different varieties of wheat), $X_i(t)$ denotes the
intensity of reflected radiation recorded at the spectrometer when the
wavelength equals $t$ and $Y_i$ denotes the level of a particular protein for the $i$th
cereal type. By constructing the linear regression at (1.1), we can predict,
from data on a new function $X$, the level of protein for that cereal type.
This is especially useful in practice, since the explanatory variables $X_i$ are
very easy and inexpensive to observe in the field using hand-held equipment,
whereas direct calculation of the $Y_i$ requires expensive and time-consuming
analysis in a laboratory. There is an extensive literature on this problem;
see, for example, [26, 31].

Once an estimator $\hat b$ of the slope $b$ is available, it is straightforward to
estimate the intercept $a$, for example, as the average of the values of
$Y_i - \int_{\mathcal I} \hat b X_i$. Therefore, much interest in the literature focuses on
estimating $b$. The conventional approach, discussed, for example, by Ramsay
and Silverman ([24], Chapter 10 and [25]), is based on principal components
analysis or PCA. Although this method has been widely discussed (e.g.,
[3, 7, 14]), relatively little is known about convergence rates of estimators,
apart from upper bounds. In this paper, we shall give optimal convergence
rates in this problem and discuss PCA-based estimators which attain those
rates. The known upper bounds for convergence rates are an order of
magnitude greater than the minimax-optimal rates derived in this paper.
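
The intercept estimate mentioned above, the average of $Y_i - \int \hat b X_i$, can be sketched as follows. The function name `estimate_intercept`, the midpoint-rule discretisation and the simulated data are assumptions made for illustration; the true slope is plugged in as a stand-in for an estimate $\hat b$.

```python
import numpy as np

def estimate_intercept(X, Y, b_hat, h):
    """Average of Y_i - int b_hat X_i, with the integral taken by the midpoint rule."""
    return float(np.mean(Y - h * (X @ b_hat)))

rng = np.random.default_rng(1)
n, p = 500, 100
h = 1.0 / p
t = (np.arange(p) + 0.5) * h
b = np.cos(np.pi * t)                  # hypothetical slope
a = 2.0                                # hypothetical intercept
# Hypothetical curves with mean function identically 1.
X = rng.normal(size=(n, 1)) * np.sqrt(2) * np.cos(np.pi * t) + 1.0
Y = a + h * (X @ b) + 0.1 * rng.normal(size=n)

# Plug in the true b as a stand-in for a slope estimate b_hat.
a_hat = estimate_intercept(X, Y, b, h)
```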

An alternative approach based on Tikhonov, or quadratic, regulariza- 
tion [29] will also be addressed. To the best of our knowledge, this approach 
has not been considered before in functional data analysis, although it has 
been widely applied to the solution of other ill-posed problems. In particular, 
quadratic regularisation methods are increasingly studied in the statistics 
literature; see, for example, work of Efromovich and Koltchinskii [11] and 
Cavalier et al. [5] on optimality properties. 

We shall show that the Tikhonov regularisation approach is also able to 
achieve optimal convergence rates and that it is robust against potential 
problems caused by tied, or closely spaced, eigenvalues in the spectral de- 
composition on which PCA is based. The difficulties that close eigenvalues 
can cause for PCA will be discussed using an example. 

The estimation of slope and intercept parameters in functional linear 
regression has points in common with a range of smoothing and decon- 
volution problems where dimension reduction is involved; see, for exam- 
ple, [9, 12, 13, 28]. Work on statistical smoothing is particularly extensive 
and relatively well known to readers, so we shall not attempt to survey it 
here. The problem of estimating the slope in functional linear regression is 
also related to that of estimating the point-spread function in image analysis 
when the true image, or test pattern, is known. Here, too, significant work 
has been done; see, for example, [18, 32]. 

Of course, the literature on linear inverse problems is very much larger
than this. In the statistics setting, it includes the work of Donoho [10]
and Johnstone [17], who used wavelet and vaguelette methods, and that
of van Rooij and Ruymgaart [30] on optimal convergence rates. There is
also closely related work in economics on the subject of panel data [16],
covariate measurement error [19] and estimation with instrumental
variables (e.g., [2, 8, 15, 22, 23]). In statistics, there is related work on
errors-in-variables problems (e.g., [4]). There is a small, but increasing,
literature on applications of functional regression to longitudinal data
analysis; see, for example, [6, 27].

2. Methodology. We shall assume that we observe independent and
identically distributed data $(X_1, Y_1), \ldots, (X_n, Y_n)$, where each explanatory
variable $X_i$ is a square integrable random function on the compact interval $\mathcal I$.
The response variables $Y_i$ are generated by the model (1.1). It will be
supposed that the errors $\varepsilon_i$ are independent and identically distributed with
finite variance and zero mean and that the errors are also independent of
the explanatory variables. Our goal is to discuss estimators of $b$ and to
describe the rate at which they converge to the true function.

We begin by describing standard functional linear regression methodology, 
as discussed by, for example, Ramsay and Silverman ([24], Chapter 10). It is 
founded on spectral expansions of both the covariance of X and its estimator 
and is constructed as follows. 

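Let $(X, Y, \varepsilon)$ denote a generic $(X_i, Y_i, \varepsilon_i)$ and put $K(u,v) = \operatorname{cov}\{X(u), X(v)\}$,
$\bar X = n^{-1} \sum_i X_i$ and

         $\hat K(u,v) = \frac{1}{n} \sum_{i=1}^n \{X_i(u) - \bar X(u)\}\{X_i(v) - \bar X(v)\}.$

Write the spectral expansions of $K$ and $\hat K$ as

(2.1)    $K(u,v) = \sum_{j=1}^\infty \kappa_j \phi_j(u) \phi_j(v), \qquad \hat K(u,v) = \sum_{j=1}^\infty \hat\kappa_j \hat\phi_j(u) \hat\phi_j(v),$

where

(2.2)    $\kappa_1 > \kappa_2 > \cdots > 0, \qquad \hat\kappa_1 \ge \hat\kappa_2 \ge \cdots \ge 0$

are the eigenvalue sequences of linear operators with kernels $K$ and $\hat K$,
respectively, and $\phi_1, \phi_2, \ldots$ and $\hat\phi_1, \hat\phi_2, \ldots$ are the respective orthonormal
eigenvector (in fact, eigenfunction) sequences. We interpret $(\hat\kappa_j, \hat\phi_j)$ as an
estimator of $(\kappa_j, \phi_j)$.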
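
For curves observed on a uniform grid, the empirical kernel and its spectral expansion in (2.1) might be approximated as in the sketch below. The function name `empirical_spectrum`, the midpoint-rule discretisation (grid spacing `h`) and the simulated curves are assumptions for illustration only.

```python
import numpy as np

def empirical_spectrum(X, h):
    """Discretised version of (2.1) for curves X (n x p) on a uniform grid of spacing h."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                   # X_i - X_bar
    K_hat = Xc.T @ Xc / n                     # values K_hat(u, v) on the grid
    # The integral operator with kernel K_hat acts as the matrix h * K_hat.
    evals, evecs = np.linalg.eigh(h * K_hat)  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]           # reorder to kappa_1 >= kappa_2 >= ...
    kappa_hat = evals[order]
    phi_hat = evecs[:, order] / np.sqrt(h)    # rescale so that int phi_hat_j^2 = 1
    return K_hat, kappa_hat, phi_hat

rng = np.random.default_rng(2)
n, p = 300, 60
h = 1.0 / p
t = (np.arange(p) + 0.5) * h
basis = np.sqrt(2) * np.cos(np.outer(np.arange(1, 6), np.pi * t))   # 5 cosines
scores = rng.normal(size=(n, 5)) * np.arange(1, 6) ** -1.0          # sd j^{-1}
X = scores @ basis
K_hat, kappa_hat, phi_hat = empirical_spectrum(X, h)
```

The columns of `phi_hat` play the role of the estimated eigenfunctions; their signs are arbitrary, as discussed in the text that follows.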

During the review process, it was suggested that the case where $\sum_j \kappa_j$
diverges might be explored. For example, the context $\kappa_j \sim j^{-\alpha}$ with $\alpha$ close
to either 0 or $\tfrac12$, might provide particular challenges. We agree that this
setting is of mathematical interest. However, it should be noted that if $\operatorname{var} X(t)$
is bounded in $t$, then $\sum_j \kappa_j < \infty$. The case of unbounded covariance does
not commonly arise in applied work.

Both sequences $\{\phi_j\}$ and $\{\hat\phi_j\}$ are complete in the class of square
integrable functions on $\mathcal I$. The fact that each $\kappa_j$ is strictly positive implies that
the linear operator corresponding to $K$, which takes a function $\phi$ to $K\phi$
and is defined by $(K\phi)(u) = \int_{\mathcal I} K(u,v)\phi(v)\,dv$, is strictly positive definite.
(To simplify notation, we use the symbol $K$ for both the kernel and the
operator.) We determine the signs of $\phi_j$ and $\hat\phi_j$, in cases where signs are
important, by insisting that $\int_{\mathcal I} \phi_j \hat\phi_j \ge 0$. This can be done without loss of
generality, for example, by changing the sign of $\hat\phi_j$ to match that of $\phi_j$, since
switching the signs of $\phi_j$ and $\hat\phi_j$ results in commensurate changes of sign
for generalized Fourier coefficients such as the quantities $\hat b_j$ and $\hat g_j$ which
we shall introduce below. Therefore, $\int_{\mathcal I} \phi_j \hat\phi_j \ge 0$ can be assumed without
altering the values taken by estimators.

A model equivalent to (1.1) is

         $Y_i - \mu = \int_{\mathcal I} b(X_i - x) + \varepsilon_i, \qquad 1 \le i \le n,$

where $x = E(X_i)$ and $\mu = E(Y_i) = a + \int_{\mathcal I} bx$, with $x$ denoting a deterministic
function on $\mathcal I$. It follows that if we define $g(u) = E[(Y - \mu)\{X(u) - x(u)\}]$,
where $(X, Y)$ represents a generic pair $(X_i, Y_i)$, then

         $Kb = g.$

Moreover, if we write $b = \sum_j b_j \phi_j$ and $g = \sum_j g_j \phi_j$, then $b_j = \kappa_j^{-1} g_j$. This
suggests the estimator

(2.3)    $\hat b(u) = \sum_{j=1}^m \hat b_j \hat\phi_j(u),$

where the truncation point $m$ is a smoothing parameter, $\hat b_j = \hat\kappa_j^{-1} \hat g_j$,
$\hat g_j = \int_{\mathcal I} \hat g \hat\phi_j$,

(2.4)    $\hat g(u) = \frac{1}{n} \sum_{i=1}^n (Y_i - \bar Y)\{X_i(u) - \bar X(u)\}$

and $\bar Y = n^{-1} \sum_i Y_i$.
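
The construction in (2.3) and (2.4) can be sketched for gridded curves as follows. The function name `pca_slope`, the midpoint-rule discretisation and the noise-free test data are assumptions made for illustration, not part of the paper; with no noise and a slope lying in the span of the first three components, the sketch should recover the slope up to rounding.

```python
import numpy as np

def pca_slope(X, Y, h, m):
    """Discretised version of the PCA estimator (2.3) on a uniform grid of spacing h."""
    n = len(Y)
    Xc = X - X.mean(axis=0)
    K_hat = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(h * K_hat)
    idx = np.argsort(evals)[::-1][:m]          # indices of the m largest eigenvalues
    kappa_hat = evals[idx]
    phi_hat = evecs[:, idx] / np.sqrt(h)       # L2-normalised eigenfunctions
    g_hat = (Y - Y.mean()) @ Xc / n            # (2.4) evaluated on the grid
    g_coef = h * (phi_hat.T @ g_hat)           # g_hat_j = int g_hat phi_hat_j
    b_coef = g_coef / kappa_hat                # b_hat_j = g_hat_j / kappa_hat_j
    return phi_hat @ b_coef

rng = np.random.default_rng(3)
n, p = 400, 80
h = 1.0 / p
t = (np.arange(p) + 0.5) * h
basis = np.sqrt(2) * np.cos(np.outer(np.arange(1, 4), np.pi * t))
X = (rng.normal(size=(n, 3)) * np.array([1.0, 0.5, 0.25])) @ basis
b = np.array([0.5, -0.3, 0.2]) @ basis         # hypothetical slope in the span of X
Y = 1.0 + h * (X @ b)                          # noise-free responses
b_hat = pca_slope(X, Y, h, m=3)
```

Taking `m` larger than the numerical rank of the sample covariance would divide by eigenvalues near zero, which is the instability that the truncation point is meant to control.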

Next, we suggest an alternative method which uses a ridge parameter $\rho$
rather than the cutoff $m$ as the smoothing parameter. Let $\hat K^+ = (\hat K + \rho I)^{-1}$
denote the inverse of the operator $\hat K + \rho I$, where $\rho > 0$ and $I$ is the identity
operator. Define

(2.5)    $\tilde b = \hat K^+ \hat g = \frac{1}{n} \sum_{i=1}^n (Y_i - \bar Y) \hat K^+ \{X_i(u) - \bar X(u)\},$

where $\hat g$ is as in (2.4). Then $\tilde b$ is an estimator alternative to $\hat b$.
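
A discretised form of the ridge-type estimator (2.5) might look like the sketch below. The function name `tikhonov_slope`, the midpoint-rule discretisation and the noise-free test data are illustrative assumptions. On the grid, the operator $\hat K + \rho I$ becomes the matrix `h * K_hat + rho * I`, so no eigendecomposition is required.

```python
import numpy as np

def tikhonov_slope(X, Y, h, rho):
    """Discretised version of (2.5): solve (K_hat + rho I) b = g_hat on a uniform grid."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    K_hat = Xc.T @ Xc / n
    g_hat = (Y - Y.mean()) @ Xc / n
    return np.linalg.solve(h * K_hat + rho * np.eye(p), g_hat)

rng = np.random.default_rng(4)
n, p = 400, 80
h = 1.0 / p
t = (np.arange(p) + 0.5) * h
basis = np.sqrt(2) * np.cos(np.outer(np.arange(1, 4), np.pi * t))
X = (rng.normal(size=(n, 3)) * np.array([1.0, 0.5, 0.25])) @ basis
b = np.array([0.5, -0.3, 0.2]) @ basis         # hypothetical slope in the span of X
Y = 1.0 + h * (X @ b)                          # noise-free responses

b_small_rho = tikhonov_slope(X, Y, h, rho=1e-8)   # almost no regularisation
b_large_rho = tikhonov_slope(X, Y, h, rho=1.0)    # heavy shrinkage toward zero
```

Because the estimator depends on the sample covariance only through a linear solve, it does not require the eigenfunctions to be individually identified.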
3. Theoretical properties. First, we treat the standard functional linear
regression estimator $\hat b$, defined in (2.3). The Karhunen-Loève expansion of
the random function $X$ is given by

         $X - E(X) = \sum_{j=1}^\infty \xi_j \phi_j,$

where the random variables $\xi_j = \int_{\mathcal I} (X - EX)\phi_j$ have zero means and
variances $E(\xi_j^2) = \kappa_j$ and are uncorrelated. Let $C > 1$ denote a constant.
Concerning the distributions of the random function $X$ and the errors $\varepsilon$ in the
model at (1.1), we shall assume that

(3.1)    $X$ has finite fourth moment, in that $\int_{\mathcal I} E(X^4) < \infty$; $E(\xi_j^4) \le C\kappa_j^2$ for all $j$,
         and the errors $\varepsilon_i$ are identically distributed with zero mean and variance
         not exceeding $C$.

Of the eigenvalues $\kappa_j$, we require that

(3.2)    $\kappa_j - \kappa_{j+1} \ge C^{-1} j^{-\alpha-1}$ for $j \ge 1$.

This condition prevents the spacings between adjacent eigenvalues from
being too small. It also implies a lower bound on the rate at which $\kappa_j$
decreases: $\kappa_j$ must not be less than a constant multiple of $j^{-\alpha}$. The importance
of (3.2) in ensuring Theorem 1, below, will be discussed following Theorem 2.
Of the Fourier coefficients $b_j$ and exponents $\alpha$ and $\beta$, we suppose that

(3.3)    $|b_j| \le C j^{-\beta}, \qquad \alpha > 1, \qquad \tfrac12\alpha + 1 < \beta.$

The first part of (3.3) can be viewed in at least two ways: as a definition
of $\beta$, in terms of a given sequence $b_j$, or as a condition that the generalized
Fourier coefficients $b_j$ do not decrease too slowly. The basis with respect
to which these coefficients are defined is determined by the context of the
problem and, more particularly, by the covariance function $K$, rather than
outside the problem. This is not unnatural, for at least two related reasons.
First, the basis $\phi_1, \phi_2, \ldots$ is canonical in the functional-data problem since
it is the unique basis with respect to which the function $X$ can be expressed
as a generalized Fourier series (its Karhunen-Loève expansion) with
uncorrelated coefficients. It gives the most rapidly convergent representation of $X$
when speed of convergence is defined in an $L_2$ sense. Second, as discussed
in Section 1, the representation with respect to this basis is fundamental
to the most popular method for estimating $b$ and is therefore particularly
deserving of study.

Note that the assumption that $K$ is bounded, or even the milder
condition $\int_{\mathcal I} \operatorname{var}\{X(u)\}\,du < \infty$, entails $\sum_j \kappa_j < \infty$. Further, note that (3.2)
implies $\kappa_j \ge Cj^{-\alpha}$ for some constant $C > 0$. Therefore, boundedness of $K$
and (3.2) together imply that $\alpha > 1$, which is the second part of (3.3). The
assumption $\tfrac12\alpha + 1 < \beta$ in (3.3) requires that the function $b$ be sufficiently
smooth relative to $K$, where smoothness of $K$ is expressed relative to the
spectral decomposition of this function. (More concisely, $b$ should be
sufficiently smooth relative to the lower bound on the smoothness of $K$ that is
implied by the condition $\kappa_j \ge Cj^{-\alpha}$.) Since $\alpha > 1$, a sufficient condition for
$\tfrac12\alpha + 1 < \beta$ is $\alpha < \beta$, which can be interpreted as requiring that the function $b$
be no less smooth than the lower bound on the smoothness of $K$ implied
by (3.2).

Of the tuning parameter $m$, we assume that

(3.4)    $m \asymp n^{1/(\alpha+2\beta)}.$

In (3.4), the relation $r_n \asymp s_n$, for positive $r_n$ and $s_n$, means that the ratio
$r_n/s_n$ is bounded away from zero and infinity.
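
To make the order relation in (3.4) concrete, the sketch below computes a truncation point of that order together with the rate $n^{-(2\beta-1)/(\alpha+2\beta)}$ that appears in Theorem 1. The function name `pca_tuning` and the multiplicative constant `c` are hypothetical: (3.4) fixes only the order of $m$, not the constant, and in practice $\alpha$ and $\beta$ are unknown.

```python
def pca_tuning(n, alpha, beta, c=1.0):
    """Truncation point of order n^{1/(alpha+2beta)} and the associated rate."""
    if not (alpha > 1 and 0.5 * alpha + 1 < beta):
        raise ValueError("requires alpha > 1 and alpha/2 + 1 < beta, as in (3.3)")
    m = max(1, int(round(c * n ** (1.0 / (alpha + 2 * beta)))))
    rate = n ** (-(2 * beta - 1) / (alpha + 2 * beta))
    return m, rate

# Example: alpha = 2, beta = 2.5 gives alpha + 2*beta = 7.
m, rate = pca_tuning(128, alpha=2.0, beta=2.5)
```

With these values, $m$ is of order $128^{1/7} = 2$ and the rate is $128^{-4/7} = 2^{-4}$.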

Let $\mathcal F(C, \alpha, \beta)$ denote the set of distributions $F$ of $(X, Y)$ that satisfy (3.1)-
(3.3) for given values of $C$, $\alpha$ and $\beta$. Let $\mathcal B$ denote the class of
measurable functions $\bar b$ of the data $(X_1, Y_1), \ldots, (X_n, Y_n)$ generated by (1.1). We
shall frame our next result in terms of minimax bounds. Below, the upper
bound (3.5) shows performance of $\hat b$ and the lower bound (3.6) reflects
performance of any estimator of $b$. The fact that the convergence rate is the
same in each instance implies that the rate for $\hat b$ is optimal in a minimax
sense.

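Theorem 1. If (3.1)-(3.4) hold, then

(3.5)    $\lim_{D\to\infty} \limsup_{n\to\infty} \sup_{F \in \mathcal F} P_F\{\int_{\mathcal I} (\hat b - b)^2 > D n^{-(2\beta-1)/(\alpha+2\beta)}\} = 0.$

Furthermore,

(3.6)    $\liminf_{n\to\infty} n^{(2\beta-1)/(\alpha+2\beta)} \inf_{\bar b \in \mathcal B} \sup_{F \in \mathcal F} \int_{\mathcal I} E_F(\bar b - b)^2 > 0.$

It follows from (3.5) that for each $F \in \mathcal F$,

         $\int_{\mathcal I} (\hat b - b)^2 = O_p(n^{-(2\beta-1)/(\alpha+2\beta)}).$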

The theorem is proved in Section 5. The fact that (3.5) is expressed in
terms of a probability rather than an expected value is not significant. By
modifying the estimator $\hat b$ using a truncation point, to prevent $\hat b$ taking values
that are too large, we may state and prove (3.5) in the more traditional form;
compare (3.10) below. We do not do this, since the present form of $\hat b$ is the
one actually used by statisticians.

Convergence rates of the form $n^{-(2\beta-1)/(\alpha+2\beta)}$ are generic to a large class
of noisy inverse problems where the difficulty of inverting the operator is
an increasing function of $\alpha$ and the smoothness of the target function is
an increasing function of $\beta$. For example, this rate arises in the context of
problems discussed by Cavalier et al. [5]. See equation (7) there and note
that the appropriate values of the components of that formula are $\lambda_j = 1$
for $1 \le j \le m$ and $\lambda_j = 0$ otherwise, $\theta_i = b_i$, $\sigma_i^2 = \operatorname{var}(\xi_i)$ and $\varepsilon^2 = n^{-1}$. Of
course, Theorem 1 cannot be derived from the results of Cavalier et al. [5], 
but, since the problem is of the same broad type, the rates enjoy the same 
form and have exactly the same formula if we make the substitutions above. 
Connections of this nature are frequently highlighted in the literature, for 
nonlinear inverse problems (see, e.g., [20, 21]) as well as linear ones. In 
particular, similar remarks can be made about the rates given by Hall and 
Horowitz [15]. 

Next, we address the alternative estimator $\tilde b$ in (2.5), where the smoothing
parameter is the ridge $\rho$, rather than the cutoff $m$. Assumptions (3.2)-(3.4)
are replaced by

(3.7)    $j^{-\alpha} \le C\kappa_j,$

(3.8)    $|b_j| \le C j^{-\beta}, \qquad \alpha > 1, \qquad \alpha - \tfrac12 < \beta,$

(3.9)    $\rho \asymp n^{-\alpha/(\alpha+2\beta)},$

respectively. Let $\mathcal G(C, \alpha, \beta)$ denote the set of distributions $F$ of $(X, Y)$ that
satisfy (3.1), (3.7) and (3.8) for given values of $C$, $\alpha$ and $\beta$.

The result below is a direct analogue of Theorem 1 in the case of $\tilde b$ rather
than $\hat b$, except that we replace the probability bound (3.5) by one on expected
value.

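Theorem 2. If (3.1) and (3.7)-(3.9) hold, then

(3.10)   $\sup_{F \in \mathcal G} \int_{\mathcal I} E_F(\tilde b - b)^2 = O(n^{-(2\beta-1)/(\alpha+2\beta)})$

as $n \to \infty$. Furthermore,

(3.11)   $\liminf_{n\to\infty} n^{(2\beta-1)/(\alpha+2\beta)} \inf_{\bar b \in \mathcal B} \sup_{F \in \mathcal G} \int_{\mathcal I} E_F(\bar b - b)^2 > 0.$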

A proof of (3.10) can be developed along the lines of that of Theorem 4.1
of Hall and Horowitz [15] and so will not be given here; a proof of (3.11) is
identical to that of (3.6). There is no close connection between the
convergence rates in (3.10) and those in [15]. In fact, the only significant linkage
is that both rates are obtained by using Tikhonov regularisation to solve a
linear inverse problem. From a conventional statistical viewpoint, our work
is much closer to that of linear regression in a large number of dimensions
than it is to instrumental variables problems.

Condition (3.7) is weaker than (3.2). For example, the latter excludes cases
where two or more of the eigenvalues $\kappa_j$ are close together, in particular,
where they are tied. [When employing the approach (2.5), it is not necessary
to assume strict inequality among the $\kappa_j$'s.] Indeed, if closely spaced
eigenvalues are permitted, (3.5) in Theorem 1 can fail while (3.10) in Theorem 2
holds. This is perhaps best illustrated by an example, which we give below,
in a setting where there are long strings of tied eigenvalues. The assumption
of perfect ties can be relaxed by permitting the $\kappa_j$'s to be very close to one
another, but not identical. The argument there is more complex, however.

Let $\gamma, r$ denote constants satisfying $1 < \gamma < \alpha r$ and let $j_k$ equal the least
integer not less than $k^{kr}$. Put $\mathcal J_k = \{j_k, j_k + 1, \ldots, j_{k+1} - 1\}$ and define
$\kappa_j = k^{-k\gamma}$ for all $j \in \mathcal J_k$. Then for $j$ in this range,

(3.12)   $\kappa_j = k^{-k\gamma} \ge k^{-\alpha k r} \ge j_k^{-\alpha} \ge j^{-\alpha}$

and also, $j_{k+1}/j_k \sim e^r k^r$ as $k$ increases. Property (3.12) implies (3.7), but
(3.2) fails because of the ties.

Those ties mean that the functions $\phi_j$, for $j$ in the block $\mathcal J_k$, are not even
identifiable. Indeed, any permutation of the function sequence $\phi_j$, $j \in \mathcal J_k$, is
equally appropriate, since within-block permutations of the $\phi_j$'s do not lead
to violations of the condition that the $\kappa_j$'s are nonincreasing. For the same
reason, while the (unordered) set of function estimators, $\hat\Phi_k = \{\hat\phi_j : j \in \mathcal J_k\}$,
converges to the set $\Phi_k = \{\phi_j : j \in \mathcal J_k\}$ as $n \to \infty$, for each $k$, the individual
estimators $\hat\phi_j$ are not consistent for the respective functions $\phi_j$.

If the sum in (2.3) is taken over a whole number of blocks $\mathcal J_k$, this
inconsistency does not cause any difficulties in estimating the slope function $b$.
There are problems, however, if the integer $m$ in (2.3) falls midway through
one of the blocks $\mathcal J_k$. For definiteness, take $m$ to equal the integer part of
$n^{1/(\alpha+2\beta)}$, thereby satisfying (3.4). Define $k_0 = k_0(n)$ to be the unique value
of $k$ such that $m \in \mathcal J_k$. Then along an infinite sequence, $\mathcal N$ say, of values
of $n$, we have

(3.13)   $\tfrac12 (j_{k_0} + j_{k_0+1}) \le m \le j_{k_0+1} - 1.$

Condition (3.13) ensures that the set of integers $j \in \mathcal J_{k_0}$ that lie between $j_{k_0}$
and $m$ comprises at least half of $\mathcal J_{k_0}$. Moreover, since $j_{k+1}/j_k \sim e^r k^r$, then
for $n \in \mathcal N$, the value of

         $\frac{1}{m} \#\{j : j \in \mathcal J_{k_0} \text{ such that } 1 \le j \le m\}$

converges to 1 as $n \to \infty$. We shall call these properties (P).






An argument based on symmetry shows that if $\hat p$ is the random
permutation of $\mathcal J_{k_0}$ defined to minimize any given symmetric measure of
performance of $\hat\Phi_{k_0}$ as an estimator of $\Phi_{k_0}$, for example, to minimize

then $\hat p$ is uniformly distributed on the set of all permutations of $\mathcal J_{k_0}$. From
this, it may be shown, using properties (P) and letting $n \to \infty$ through
values in $\mathcal N$, that (3.5) fails.

4. Numerical properties. This section summarizes the results of a Monte
Carlo investigation of the finite-sample performance of the estimators $\hat b$ and $\tilde b$
discussed in Section 2. Samples of sizes $n = 100$ and 500 were generated
from the model (1.1), with $\mathcal I = [0, 1]$, $a = 0$ and the errors $\varepsilon_i$ distributed as
normal $N(0, \sigma_\varepsilon^2)$, where $\sigma_\varepsilon = 0.5$ or 1. We took $b = \sum_{1 \le j \le 50} b_j \phi_j$ and
$X = \sum_{1 \le j \le 50} \gamma_j Z_j \phi_j$, where (a) $b_1 = 0.3$ and $b_j = 4(-1)^{j+1} j^{-2}$ for $j > 1$, (b) the
$\gamma_j$'s were deterministic coefficients, (c) $\phi_1 \equiv 1$ and $\phi_{j+1}(t) = 2^{1/2} \cos(j\pi t)$ for
$j \ge 1$ and (d) the $Z_j$'s were uniformly distributed on $[-3^{1/2}, 3^{1/2}]$. In
particular, each $Z_j$ had zero mean and unit variance.

Two sets of the $\gamma_j$'s were used. In the first, $\gamma_j = (-1)^{j+1} j^{-\alpha/2}$, with $\alpha =$
1.1, 1.5, 2 or 4. For these coefficients, the eigenvalues of the operator $K$ were
$\kappa_j = j^{-\alpha}$ and were distinct. In the remainder of this section, we label these
eigenvalues "well-spaced." In the second set, $\gamma_1 = 1$, $\gamma_j = 0.2(-1)^{j+1}(1 - 0.0001j)$
if $2 \le j \le 4$, and $\gamma_{5j+k} = 0.2(-1)^{5j+k+1}\{(5j)^{-\alpha/2} - 0.0001k\}$ for
$j \ge 1$ and $0 \le k \le 4$. This set of $\gamma_j$'s generated blocks of $\kappa_j$'s that were
nearly equal when $j$ was not too large and we refer to it as the "closely
spaced" case. The theoretical arguments presented in Section 3 suggest that
the performance of $\hat b$ can be poor in this setting.
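
The simulation design described above can be sketched as follows. The function names `gammas`, `basis` and `simulate`, the midpoint grid and the midpoint-rule integral are assumptions made for illustration. The cosine basis is taken to be $\phi_1 \equiv 1$ and $\phi_{j+1}(t) = 2^{1/2}\cos(j\pi t)$, which is the reading of item (c) adopted here, and the eigenvalues are $\kappa_j = \gamma_j^2$ because each $Z_j$ has unit variance.

```python
import numpy as np

J = 50  # number of basis functions used in the design

def gammas(alpha, closely_spaced=False):
    """Coefficients gamma_1..gamma_J for the two designs described in the text."""
    j = np.arange(1, J + 1, dtype=float)
    sign = (-1.0) ** (j + 1)
    if not closely_spaced:
        return sign * j ** (-alpha / 2)
    g = np.empty(J)
    g[0] = 1.0
    g[1:4] = 0.2 * sign[1:4] * (1 - 0.0001 * j[1:4])
    k = np.mod(j[4:], 5)               # position within a block, 0..4
    block = j[4:] - k                  # the value written as 5j in the text
    g[4:] = 0.2 * sign[4:] * (block ** (-alpha / 2) - 0.0001 * k)
    return g

def basis(t):
    """Rows are phi_1 = 1 and phi_{j+1}(t) = sqrt(2) cos(j pi t)."""
    phi = np.ones((J, t.size))
    phi[1:] = np.sqrt(2) * np.cos(np.outer(np.arange(1, J), np.pi * t))
    return phi

def simulate(n, alpha, sigma, closely_spaced, t, rng):
    phi = basis(t)
    j = np.arange(1, J + 1, dtype=float)
    b_coef = 4 * (-1.0) ** (j + 1) * j ** -2.0
    b_coef[0] = 0.3
    b = b_coef @ phi
    Z = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, J))   # mean 0, variance 1
    X = (Z * gammas(alpha, closely_spaced)) @ phi
    h = t[1] - t[0]
    Y = h * (X @ b) + rng.normal(scale=sigma, size=n)        # intercept a = 0
    return X, Y, b

t = (np.arange(50) + 0.5) / 50
X, Y, b = simulate(100, 1.1, 0.5, True, t, np.random.default_rng(0))
kappa_well = gammas(1.1) ** 2
kappa_close = gammas(1.1, closely_spaced=True) ** 2
```

Comparing `kappa_well[1] - kappa_well[2]` with `kappa_close[1] - kappa_close[2]` shows how much smaller the spacings are in the second design.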

All our results represent averages over 1000 Monte Carlo replications for
each parameter setting. The quantities denoted by Bias$^2$, Var and MISE
in Tables 1 and 2 are Monte Carlo approximations to integrated squared
bias, integrated variance and mean integrated squared error, respectively,
computed on a grid of 50 equally spaced points on $\mathcal I$. The values of $m$
and $\rho$, for given $n$, $\sigma_\varepsilon^2$, $\alpha$ and a given set of $\gamma_j$'s, were chosen to minimize
MISE.
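
The three summaries can be approximated from a stack of replicated estimates as in the sketch below. The function name `mc_summaries`, the synthetic "estimates" and the midpoint-rule integral are illustrative assumptions. With the variance computed using divisor equal to the number of replications, MISE equals integrated squared bias plus integrated variance exactly.

```python
import numpy as np

def mc_summaries(estimates, b, h):
    """estimates: (R, p) array of replicated estimates on a grid of spacing h."""
    mean_est = estimates.mean(axis=0)
    bias2 = h * np.sum((mean_est - b) ** 2)              # integrated squared bias
    var = h * np.sum(estimates.var(axis=0))              # integrated variance (ddof=0)
    mise = h * np.mean(np.sum((estimates - b) ** 2, axis=1))
    return bias2, var, mise

rng = np.random.default_rng(7)
p = 50
h = 1.0 / p
t = (np.arange(p) + 0.5) * h
b = np.cos(np.pi * t)                                    # hypothetical target
# Synthetic replicated estimates: a shifted target plus noise.
estimates = b + 0.1 + 0.2 * rng.normal(size=(1000, p))
bias2, var, mise = mc_summaries(estimates, b, h)
```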

Table 1 shows that in the case of well-spaced eigenvalues, the MISE of $\hat b$
is smaller than that of $\tilde b$ for almost all values of the other design parameters.
However, it follows from Table 2 that in the closely spaced case, the MISE
of $\tilde b$ is nearly always smaller than that of $\hat b$. Thus, in terms of MISE, neither
estimator dominates the other.

Both tables reveal that there is a general tendency for MISE to decrease
as $\alpha$ increases. This does not contradict (3.5) or (3.10) since those results
describe the behavior of MISE as a function of $n$ for fixed $\alpha$ and $\beta$, not the
behavior of MISE as a function of $\alpha$ or $\beta$ for fixed $n$.

5. Derivation of Theorem 1.

5.1. Proof of (3.5). We begin by defining notation to be used in the
proof. Given a sequence $c_n$ of positive constants, we shall use $O_p(c_n)$ and
$o_p(c_n)$ to denote random variables $R_n$ and $r_n$, respectively, which satisfy

         $\lim_{D\to\infty} \limsup_{n\to\infty} \sup_{F \in \mathcal F} P_F(|R_n| > Dc_n) = 0,$

         $\lim_{n\to\infty} \sup_{F \in \mathcal F} P_F(|r_n| > Dc_n) = 0$ for each $D > 0$.

Similarly, a deterministic quantity $A_n = A_n(F)$, written as $A_n = O(c_n)$, will
be understood to satisfy

         $\sup_{n \ge 1} c_n^{-1} \sup_{F \in \mathcal F} |A_n(F)| < \infty.$

Next, we state subsidiary results concerning distances between the spectra
of two operators. Let $L$ denote a general positive semidefinite linear operator
as well as the kernel of that operator. Let the spectral decomposition of $L$
be

(5.1)    $L(u,v) = \sum_{j=1}^\infty \lambda_j \psi_j(u) \psi_j(v).$



Table 1
Results of Monte Carlo experiments for well-spaced eigenvalues

$\sigma_\varepsilon$   $n$   $\alpha$   $m$   $\rho$    Bias$^2(\hat b)$   Bias$^2(\tilde b)$   Var$(\hat b)$   Var$(\tilde b)$   MISE$(\hat b)$   MISE$(\tilde b)$
0.5    100   1.1   2   0.4     0.158   1.150   0.843   1.340   1.001   2.490
0.5    100   1.5   2   0.38    0.148   1.289   0.718   0.759   0.866   2.048
0.5    100   2.0   2   0.28    0.140   1.202   0.676   0.622   0.816   1.824
0.5    100   4.0   2   0.10    0.134   1.344   2.225   0.611   2.359   1.955
0.5    500   1.1   3   0.28    0.016   0.717   0.236   0.480   0.251   1.197
0.5    500   1.5   3   0.22    0.015   0.663   0.254   0.364   0.269   1.027
0.5    500   2.0   2   0.12    0.139   0.416   0.146   0.441   0.285   0.857
0.5    500   4.0   2   0.032   0.139   0.460   0.409   0.493   0.548   0.953
1.0    100   1.1   2   1.0     0.161   2.709   2.034   1.203   2.195   3.913
1.0    100   1.5   2   0.75    0.149   2.401   2.221   1.019   2.370   3.420
1.0    100   2.0   2   0.50    0.139   2.047   2.395   1.034   2.534   3.081
1.0    100   4.0   1   0.25    3.257   2.302   0.501   0.788   3.758   3.090
1.0    500   1.1   2   0.50    0.142   1.438   0.408   0.758   0.549   2.197
1.0    500   1.5   2   0.35    0.138   1.164   0.425   0.702   0.563   1.866
1.0    500   2.0   2   0.10    0.139   0.314   0.514   2.279   0.654   2.593
1.0    500   4.0   2   0.10    0.139   1.386   1.647   0.472   1.786   1.858
Table 2
Results of Monte Carlo experiments for closely spaced eigenvalues

$\sigma_\varepsilon$   $n$   $\alpha$   $m$   $\rho$    Bias$^2(\hat b)$   Bias$^2(\tilde b)$   Var$(\hat b)$   Var$(\tilde b)$   MISE$(\hat b)$   MISE$(\tilde b)$
0.5    100   1.1   1   0.22    3.526   2.502   0.141   0.585   3.398   3.087
0.5    100   1.5   1   0.22    3.257   2.487   0.131   0.455   3.389   2.942
0.5    100   2.0   1   0.20    3.259   2.403   0.126   0.454   3.385   2.857
0.5    100   4.0   1   0.20    3.260   2.402   0.130   0.433   3.390   2.835
0.5    500   1.1   5   0.08    0.002   1.463   2.510   0.574   2.512   2.037
0.5    500   1.5   5   0.06    0.002   1.212   2.604   0.623   2.606   1.835
0.5    500   2.0   5   0.04    0.006   0.846   2.528   0.783   2.535   1.629
0.5    500   4.0   5   0.04    0.006   0.780   2.500   0.640   2.506   1.420
1.0    100   1.1   1   0.42    3.260   3.127   0.533   0.856   3.793   3.983
1.0    100   1.5   1   0.42    3.271   3.031   0.512   0.706   3.783   3.736
1.0    100   2.0   1   0.32    3.260   2.822   0.540   0.937   3.799   3.759
1.0    100   4.0   1   0.36    3.262   2.954   0.496   0.760   3.758   3.713
1.0    500   1.1   1   0.20    3.258   2.379   0.109   0.532   3.367   2.911
1.0    500   1.5   1   0.14    3.262   2.078   0.109   0.729   3.372   2.807
1.0    500   2.0   1   0.12    3.262   1.922   0.103   0.762   3.366   2.684
1.0    500   4.0   1   0.12    3.256   1.818   0.107   0.695   3.363   2.514

We assume that the terms are ordered in such a way that $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$.
Given univariate functions $p$, $q$ and a symmetric bivariate function $M$, let
$|||M||| = (\int\!\!\int_{\mathcal I^2} M^2)^{1/2}$. Write $\int pq$ and $\int Mpq$ for

         $\int_{\mathcal I} p(u)q(u)\,du$ and $\int\!\!\int_{\mathcal I^2} M(u,v)p(u)q(v)\,du\,dv,$

respectively. Further, denote by $\int Mp$ the function of which the value at $u$
is $\int_{\mathcal I} M(u,v)p(v)\,dv$ and define $\delta_j = \min_{1 \le k \le j}(\kappa_k - \kappa_{k+1})$.

The following pair of results may be derived from theory developed by
Bhatia, Davis and McIntosh [1]:

(5.2)    $\sup_{j \ge 1} |\kappa_j - \lambda_j| \le |||K - L|||, \qquad \sup_{j \ge 1} \delta_j \|\phi_j - \psi_j\| \le 8^{1/2} |||K - L|||.$

In framing the second bound here, we use the convention that $\int \phi_j \psi_j \ge 0$.
This determines the sign of $\psi_j$ in those cases where choice of sign has an
impact on the validity of (5.2).
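
The first bound in (5.2) can be checked numerically on discretised kernels, as in the sketch below. The two kernels are arbitrary positive semidefinite matrices built for illustration, and the midpoint-rule discretisation (operator `h * K`, Hilbert-Schmidt norm `h * ||K - L||_F`) is an assumption of the sketch. In this matrix setting the inequality is an instance of Weyl's eigenvalue perturbation bound combined with the fact that the spectral norm does not exceed the Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 40
h = 1.0 / p

A = rng.normal(size=(p, p))
B = A + 0.1 * rng.normal(size=(p, p))
K = A @ A.T / p                      # hypothetical kernel values K(u, v)
L = B @ B.T / p                      # perturbed kernel values L(u, v)

# Eigenvalues of the discretised integral operators, in decreasing order.
kappa = np.sort(np.linalg.eigvalsh(h * K))[::-1]
lam = np.sort(np.linalg.eigvalsh(h * L))[::-1]

# |||K - L||| = (double integral of (K - L)^2)^{1/2}, by the midpoint rule.
hs_norm = h * np.linalg.norm(K - L, "fro")
max_gap = np.max(np.abs(kappa - lam))
```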

The following lemma will be proven in Section 5.2: 

Lemma 5.1. If we are able to write ipj — 4>j = Xj + for functions Xj 
and Aj, then 



(5.3) 



A,-| + 



(K-L) {(pj + Xj)<Pj 
Furthermore, if $\inf_{k : k \ne j} |\lambda_j - \kappa_k| > 0$, then

(5.4)    $\psi_j - \phi_j = \sum_{k : k \ne j} (\lambda_j - \kappa_k)^{-1} \phi_k \int (L - K)\psi_j \phi_k + \phi_j \int (\psi_j - \phi_j)\phi_j.$



Put $\hat\Delta = |||\hat K - K|||$ and define the event $\mathcal E_m$ by

         $\mathcal E_m = \mathcal E_m(n) = \{\tfrac12 \kappa_m \ge \hat\Delta\}.$

That is, $\mathcal E_m$ denotes the set of all realizations such that for sample size $n$,
$\tfrac12 \kappa_m \ge \hat\Delta$. Below, when we say that a bound is valid when $\mathcal E_m$ holds, this
should be interpreted as stating that the bound is valid for all realizations
for which $\tfrac12 \kappa_m \ge \hat\Delta$. It is not a statement that relates to a conditioning
argument in the sense that conditioning is usually interpreted in probability
theory.

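Write $\hat b_j = \check b_j + \hat\kappa_j^{-1}(S_{j1} + S_{j2} + S_{j3})$, where $\hat\kappa_j \check b_j = \int g\phi_j$, $S_{j1} = \int (\hat g - g)\phi_j$,
$S_{j2} = \int g(\hat\phi_j - \phi_j)$ and $S_{j3} = \int (\hat g - g)(\hat\phi_j - \phi_j)$. In this notation,

         $\sum_{j=1}^m (\hat b_j - \check b_j)^2 \le 3 \sum_{j=1}^m \hat\kappa_j^{-2}(S_{j1}^2 + S_{j2}^2 + S_{j3}^2)$

(5.5)        $\le 12 \sum_{j=1}^m \kappa_j^{-2}(S_{j1}^2 + S_{j2}^2 + S_{j3}^2)$

             $\le 12 \sum_{j=1}^m \kappa_j^{-2}(S_{j1}^2 + S_{j2}^2) + 12 \|\hat g - g\|^2 \sum_{j=1}^m \kappa_j^{-2} \|\hat\phi_j - \phi_j\|^2,$

where the first inequality holds universally; the second inequality, obtained
using the first part of (5.2), is valid provided the event $\mathcal E_m$ holds; and the
third inequality employs the bound $|S_{j3}| \le \|\hat g - g\| \, \|\hat\phi_j - \phi_j\|$.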
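Note that provided $\mathcal E_m$ holds, we have

(5.6)    $\sum_{j=1}^m (\check b_j - b_j)^2 = \sum_{j=1}^m \left(\frac{\hat\kappa_j - \kappa_j}{\hat\kappa_j \kappa_j}\right)^2 \left(\int g\phi_j\right)^2 \le 4 \sum_{j=1}^m \left(\frac{\hat\kappa_j - \kappa_j}{\kappa_j^2}\right)^2 \left(\int g\phi_j\right)^2 = 4 \sum_{j=1}^m \kappa_j^{-2} b_j^2 (\hat\kappa_j - \kappa_j)^2.$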

Define $\hat\Delta_j = \|\int (\hat K - K)\phi_j\|$. Using (5.3) with $\chi_j = 0$ and then applying both
parts of (5.2), we obtain



(5.7) 



< W^j - cl)j\\{\kj - Kj\ + Aj) 
Combining (5.6) and (5.7) and defining $\hat\Delta_{jj} = |\int (\hat K - K)\phi_j \phi_j|$, we deduce
that if $\mathcal E_m$ holds, then

mm m 

(5.8) ^(6, - <8Y^^'bj^ + ^(5,.,)-26|(A2 + A|). 

We shall prove in Section 5.3 that under the conditions of the theorem,

(5.9)    $E(\hat\Delta^2) + E(\hat\Delta_j^2) = O(n^{-1}), \qquad E(\hat\Delta_{jj}^2) = O(n^{-1}\kappa_j^2),$

uniformly in $j$. In particular, (5.9) entails $\hat\Delta = O_p(n^{-1/2})$. Now, (3.2) and (3.4)
imply that $n^{1/2}\kappa_m \to \infty$ as $n \to \infty$, so the first part of (5.9) implies that
$P(\mathcal E_m) \to 1$. Therefore, since the result (3.5) that we wish to prove relates
only to probabilities of differences (not to moments of differences), it
suffices to work with bounds that are established under the assumption that
$\mathcal E_m$ holds, since the contrary case contributes only $o(1)$ to the probability on
the left-hand side of (3.5).

In our arguments below, we shall use the property $\hat\Delta = O_p(n^{-1/2})$
without further reference. Now, the conditions in Theorem 1 imply that
$\delta_j^{-1} \le C_1 j^{\alpha+1}$, whence it follows that



m m 

n 



(5.10) 

i=i i=i 



where Ci, . . . , C5 are positive constants and s(n) equals n^^""^^"*"^^/^"^^^-* if 
the exponent is strictly positive, equals 1 + logn if the exponent vanishes 
and equals 1 otherwise. Combining (5.8)-(5.10), we deduce that 

(5.11)   $\sum_{j=1}^m (\check b_j - b_j)^2 = O_p\{n^{-1} + n^{-2}s(n)\} = O_p(n^{-(2\beta-1)/(\alpha+2\beta)}).$



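Observe, too, that

         $\int_{\mathcal I} \left\{\sum_{j=1}^m b_j \hat\phi_j(u) - b(u)\right\}^2 du \le 2 \int_{\mathcal I} \left\{\sum_{j=1}^m b_j(\hat\phi_j - \phi_j)\right\}^2 + 2 \int_{\mathcal I} \left\{\sum_{j=m+1}^\infty b_j \phi_j\right\}^2$

(5.12)       $\le 2m \sum_{j=1}^m b_j^2 \|\hat\phi_j - \phi_j\|^2 + 2 \sum_{j=m+1}^\infty b_j^2.$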
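Combining (5.5), (5.11) and (5.12), we find that

         $\int_{\mathcal I} (\hat b - b)^2 \le 3 \sum_{j=1}^m (\hat b_j - \check b_j)^2 + 3 \sum_{j=1}^m (\check b_j - b_j)^2 + 3 \int_{\mathcal I} \left(\sum_{j=1}^m b_j \hat\phi_j - b\right)^2$

(5.13)       $\le 36 \sum_{j=1}^m \kappa_j^{-2}(S_{j1}^2 + S_{j2}^2)$

             $+ 36 \sum_{j=1}^m (m b_j^2 + \|\hat g - g\|^2 \kappa_j^{-2}) \|\hat\phi_j - \phi_j\|^2$

             $+ 6 \sum_{j=m+1}^\infty b_j^2 + O_p(n^{-(2\beta-1)/(\alpha+2\beta)}).$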

Simple moment calculations show that $E\|\hat g - g\|^2 = O(n^{-1})$ and, clearly,
$\sum_{j \ge m+1} b_j^2 = O(n^{-(2\beta-1)/(\alpha+2\beta)})$. It will be proved in Section 5.3 that

(5.14)   $E(S_{j1}^2) = O(n^{-1}\kappa_j),$

whence it follows that $\sum_{j \le m} \kappa_j^{-2} S_{j1}^2 = O_p(n^{-(2\beta-1)/(\alpha+2\beta)})$. Combining these
results and (5.13), we see that (3.5) will follow if we prove that



j=i ^ ^ i=i 

= Op(n-(2/3-i)/("+2/3)). 



2/^ + n-y")||<^,-<A,f 



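Derivation of this property requires bounds on $\hat\phi_j - \phi_j$, which we now discuss.
Take $L = \hat K$, $\lambda_j = \hat\kappa_j$ and $\psi_j = \hat\phi_j$ in Lemma 5.1. Formula (5.4) yields

$\|\hat\phi_j - \phi_j\|^2 = u_j^2 + v_j^2$, where

         $u_j^2 = \sum_{k : k \ne j} (\hat\kappa_j - \kappa_k)^{-2} \left\{\int (\hat K - K)\hat\phi_j \phi_k\right\}^2$

and $v_j^2 = \{\int (\hat\phi_j - \phi_j)\phi_j\}^2$. Now, $u_j$ equals the length of the projection of
$\hat\phi_j - \phi_j$ into the plane perpendicular to $\phi_j$; hence, it also equals the projection
of $\hat\phi_j$ into that plane. Also, $\int \hat\phi_j \phi_j$ equals the length of the projection of $\hat\phi_j$
onto $\phi_j$. Therefore, by Pythagoras' Theorem, $(\int \hat\phi_j \phi_j)^2 + u_j^2 = \|\hat\phi_j\|^2 = 1$,
whence it follows that $\int \hat\phi_j \phi_j = (1 - u_j^2)^{1/2}$. Hence,

         $v_j^2 = \left(1 - \int \hat\phi_j \phi_j\right)^2 = \{1 - (1 - u_j^2)^{1/2}\}^2 = 2\{1 - (1 - u_j^2)^{1/2}\} - u_j^2,$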
which implies that

(5.16)   $\|\hat\phi_j - \phi_j\|^2 = 2\{1 - (1 - u_j^2)^{1/2}\} \le 2u_j^2.$
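
The geometry behind (5.16) can be checked in a finite-dimensional analogue, with unit vectors standing in for the unit-norm functions. The sketch below is an illustration under that assumption; it uses the geometric description of $u_j$ (the component of $\hat\phi_j$ perpendicular to $\phi_j$) rather than the series formula, and the vectors themselves are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
phi = rng.normal(size=30)
phi /= np.linalg.norm(phi)                     # unit vector playing the role of phi_j
phi_hat = phi + 0.3 * rng.normal(size=30)
phi_hat /= np.linalg.norm(phi_hat)             # unit vector playing the role of phi_hat_j
if phi_hat @ phi < 0:                          # sign convention: inner product >= 0
    phi_hat = -phi_hat

inner = phi_hat @ phi
u2 = np.sum((phi_hat - inner * phi) ** 2)      # squared perpendicular component
v2 = (inner - 1.0) ** 2                        # squared component along phi
dist2 = np.sum((phi_hat - phi) ** 2)
identity_rhs = 2 * (1 - np.sqrt(1 - u2))       # right-hand side of the equality in (5.16)
```

The inequality in (5.16) relies on the sign convention: it amounts to $1 - c \le 1 - c^2$ for an inner product $c \in [0, 1]$.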
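Let $C > 0$ and define

         $\mathcal F_m = \mathcal F_m(n) = \{(\hat\kappa_j - \kappa_k)^{-2} \le 2(\kappa_j - \kappa_k)^{-2} \le C n^{2(\alpha+1)/(\alpha+2\beta)}, \ 1 \le j \le m, \ k \ne j\},$

that is, the set of realisations such that, for sample size $n$, $(\hat\kappa_j - \kappa_k)^{-2} \le
2(\kappa_j - \kappa_k)^{-2} \le C n^{2(\alpha+1)/(\alpha+2\beta)}$. Observe that

(5.17)   $\left\{\int (\hat K - K)\hat\phi_j \phi_k\right\}^2 \le 2 \left\{\int (\hat K - K)\phi_j \phi_k\right\}^2 + 2w_{jk}^2,$

where $w_{jk}^2 = \{\int (\hat K - K)(\hat\phi_j - \phi_j)\phi_k\}^2$. Note, too, that uniformly in $1 \le j \le
m$,

         $\min(\kappa_j - \kappa_{j+1}, \kappa_{j-1} - \kappa_j) \ge C_1 m^{-(\alpha+1)} \ge C_2 n^{-(\alpha+1)/(\alpha+2\beta)},$

where $C_1, C_2$ denote positive constants, and that since $\beta > \tfrac12\alpha + 1$, it follows
that $n^{-1/2} = o(n^{-(\alpha+1)/(\alpha+2\beta)})$. These properties, and the fact that $|\hat\kappa_j -
\kappa_j| \le \hat\Delta = O_p(n^{-1/2})$, imply that if the constant $C$ in the definition of $\mathcal F_m$
is chosen to be sufficiently large, then $P(\mathcal F_m) \to 1$ as $n \to \infty$. Therefore, as
in the case of $\mathcal E_m$, since (3.5) relates only to probabilities of differences, it
suffices to work with bounds that are established under the assumption that
$\mathcal F_m$ holds. In this case,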


(5.18)   $\sum_{k : k \ne j} (\hat\kappa_j - \kappa_k)^{-2} w_{jk}^2 \le C n^{2(\alpha+1)/(\alpha+2\beta)} \sum_{k=1}^\infty w_{jk}^2.$

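Using Parseval's identity and the Cauchy-Schwarz inequality, we may
prove that

(5.19)   $\sum_{k=1}^\infty w_{jk}^2 = \int_{\mathcal I} \left[\int_{\mathcal I} (\hat K - K)(u,v)(\hat\phi_j - \phi_j)(v)\,dv\right]^2 du \le \hat\Delta^2 \|\hat\phi_j - \phi_j\|^2.$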

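Combining (5.17)-(5.19), we deduce that provided $\mathcal F_m$ holds, we have

         $u_j^2 \le 2 \sum_{k : k \ne j} (\hat\kappa_j - \kappa_k)^{-2} \left\{\int (\hat K - K)\phi_j \phi_k\right\}^2 + 2C n^{2(\alpha+1)/(\alpha+2\beta)} \hat\Delta^2 \|\hat\phi_j - \phi_j\|^2.$

Substituting into (5.16), we find that

(5.20)   $(1 - 4C n^{2(\alpha+1)/(\alpha+2\beta)} \hat\Delta^2) \|\hat\phi_j - \phi_j\|^2 \le 4 \sum_{k : k \ne j} (\hat\kappa_j - \kappa_k)^{-2} \left\{\int (\hat K - K)\phi_j \phi_k\right\}^2.$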
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Recall that A = Op{n ^/^) and observe that since /3 > + 1, we have 
^2{q+i)/{q+2/3) . j^-i _^ Q Therefore, noting that P{J='m) 1, we deduce that 
(5.20) implies 

2 



(5.21) 



k : k=/=j 



<8{l + 0p(l)} J2 i'^j-f^kr^yiK-K)(Pj(l)kY 



k : k=/=j 



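where the $o_p(1)$ terms are of that order uniformly in $1 \le j \le m$. We shall show in Section 5.3 that

(5.22) $n \sum_{k:\,k \ne j} (\kappa_j - \kappa_k)^{-2} E\big\{\int (\hat K - K)\phi_j \phi_k\big\}^2 = O(j^2),$

uniformly in $1 \le j \le m$. Results (5.21) and (5.22) together imply that

(5.23) $\sum_{j=1}^{m} (m j^{-2\beta} + n^{-1} j^{\alpha}) \|\hat\phi_j - \phi_j\|^2 = O_p(m n^{-1} + m^{\alpha+3} n^{-2}) = O_p(n^{-(2\beta-1)/(\alpha+2\beta)}).$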
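
The exponent comparisons used in this part of the argument all reduce to the condition $\beta > \tfrac12\alpha + 1$. A small sketch with exact rational arithmetic makes this visible; the function name and dictionary keys are hypothetical, and the inputs are assumed to be integers or exactly representable numbers.

```python
from fractions import Fraction

def rate_exponents(alpha, beta):
    """Exponents of n appearing in the bounds, as exact fractions."""
    a, b = Fraction(alpha), Fraction(beta)
    denom = a + 2 * b
    return {
        # spacing bound n^{-(alpha+1)/(alpha+2beta)}
        "eigenvalue_gap": -(a + 1) / denom,
        # order of Delta_hat, namely n^{-1/2}
        "delta_hat": Fraction(-1, 2),
        # n^{2(alpha+1)/(alpha+2beta)} * n^{-1}, which must tend to zero
        "gap_inverse_sq_times_delta_sq": 2 * (a + 1) / denom - 1,
        # target rate n^{-(2beta-1)/(alpha+2beta)}
        "target_rate": -(2 * b - 1) / denom,
    }

e = rate_exponents(2, 3)   # beta = 3 > alpha/2 + 1 = 2
print(e)
```

When $\beta > \tfrac12\alpha + 1$ the `delta_hat` exponent is strictly below the `eigenvalue_gap` exponent and the third entry is negative; at the boundary $\beta = \tfrac12\alpha + 1$ both comparisons become equalities.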



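Next, observe that

(5.24) $\int g(\hat\phi_j - \phi_j) = \sum_{k:\,k \ne j} g_k (\hat\kappa_j - \kappa_k)^{-1} \int (\hat K - K)\hat\phi_j \phi_k + g_j \int (\hat\phi_j - \phi_j)\phi_j = T_{j1} + T_{j2} + T_{j3} + T_{j4},$

where

$T_{j1} = \sum_{k:\,k \ne j} g_k (\kappa_j - \kappa_k)^{-1} \int (\hat K - K)\phi_j \phi_k,$

$T_{j2} = \sum_{k:\,k \ne j} g_k \{(\hat\kappa_j - \kappa_k)^{-1} - (\kappa_j - \kappa_k)^{-1}\} \int (\hat K - K)\phi_j \phi_k,$

$T_{j3} = \sum_{k:\,k \ne j} g_k (\hat\kappa_j - \kappa_k)^{-1} \int (\hat K - K)(\hat\phi_j - \phi_j)\phi_k$

and $T_{j4} = g_j \int (\hat\phi_j - \phi_j)\phi_j$. Let $C_1, C_2, \ldots$ denote positive constants. Since $|g_k| \le C_1 k^{-(\alpha+\beta)}$, then if $\mathcal{F}_m$ holds, we have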
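$T_{j2}^2 \le C_2 (\hat\kappa_j - \kappa_j)^2 \Big[\sum_{k=1}^{\infty} \big\{\int (\hat K - K)\phi_j \phi_k\big\}^2\Big] \sum_{k:\,k \ne j} k^{-2(\alpha+\beta)} (\kappa_j - \kappa_k)^{-4}.$

Now,

$\sum_{k \ge 2j} k^{-2(\alpha+\beta)} (\kappa_j - \kappa_k)^{-4} \le C_3\, j^{4\alpha} \sum_{k \ge 2j} k^{-2(\alpha+\beta)} \le C_4\, j^{2\alpha - 2\beta + 1},$

$\sum_{k \le j/2} k^{-2(\alpha+\beta)} (\kappa_j - \kappa_k)^{-4} \le C_5 \sum_{k \le j/2} k^{-2(\alpha+\beta)} (\kappa_k - \kappa_{2k})^{-4} \le C_6 \sum_{k \le j/2} k^{2\alpha - 2\beta} \le C_7 \times \begin{cases} 1, & \text{if } \alpha + \tfrac12 < \beta, \\ 1 + \log j, & \text{if } \alpha + \tfrac12 = \beta, \\ j^{2\alpha - 2\beta + 1}, & \text{if } \alpha + \tfrac12 > \beta, \end{cases}$

$\sum_{j/2 < k < 2j,\ k \ne j} k^{-2(\alpha+\beta)} (\kappa_j - \kappa_k)^{-4} \le C_8\, j^{-2(\alpha+\beta)} \sum_{j/2 < k < 2j,\ k \ne j} (\kappa_j - \kappa_k)^{-4} \le C_9\, j^{2\alpha - 2\beta + 4} \sum_{k:\,k \ne j} |j - k|^{-4} \le C_{10}\, j^{2\alpha - 2\beta + 4}.$

Therefore,

(5.25) $\sum_{k:\,k \ne j} k^{-2(\alpha+\beta)} (\kappa_j - \kappa_k)^{-4} \le C_{11} (1 + j^{2\alpha - 2\beta + 4} + \log j),$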

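whence, using (5.2) and (5.9), we have

(5.26) $\sum_{j=1}^{m} j^{2\alpha} T_{j2}^2 \le C_{12} \sum_{j=1}^{m} (\hat\kappa_j - \kappa_j)^2 \hat\Delta_j^2 \big(j^{2\alpha} \log n + j^{4\alpha - 2\beta + 4}\big)$

$\qquad = O_p\Big\{ n^{-1} \sum_{j=1}^{m} E(\hat\Delta_j^2) \big(j^{2\alpha} \log n + j^{4\alpha - 2\beta + 4}\big) \Big\}$

$\qquad = O_p\Big\{ n^{-2} \sum_{j=1}^{m} \big(j^{2\alpha} \log n + j^{4\alpha - 2\beta + 4}\big) \Big\}$

$\qquad = O_p\{ n^{-2} (m^{2\alpha+1} \log n + m^{4\alpha - 2\beta + 5}) \} = O_p(n^{-(2\beta-1)/(\alpha+2\beta)}).$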
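If $\mathcal{F}_m$ holds, then

(5.27) $|T_{j3}| \le 2^{1/2} \sum_{k:\,k \ne j} |g_k|\, |\kappa_j - \kappa_k|^{-1} \Big| \int (\hat K - K)(\hat\phi_j - \phi_j)\phi_k \Big|$

$\qquad \le C_{13} \|\hat\phi_j - \phi_j\| \sum_{k:\,k \ne j} k^{-(\alpha+\beta)} |\kappa_j - \kappa_k|^{-1} \int |\phi_k(u)| \Big[ \int \{\hat K(u,v) - K(u,v)\}^2 dv \Big]^{1/2} du$

$\qquad \le C_{14} \hat\Delta \|\hat\phi_j - \phi_j\|,$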

where the last inequality follows using the argument leading to (5.25). From (5.27), 
using (5.21) and (5.22), it may be shown that 



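(5.28) $\sum_{j=1}^{m} j^{2\alpha} T_{j3}^2 = O_p\Big( n^{-2} \sum_{j=1}^{m} j^{2\alpha+2} \Big) = O_p(n^{-(2\beta-1)/(\alpha+2\beta)}).$

More simply,

(5.29) $\sum_{j=1}^{m} j^{2\alpha} T_{j4}^2 \le C_{15} \sum_{j=1}^{m} j^{-2\beta} \|\hat\phi_j - \phi_j\|^2 = O_p(n^{-1}).$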

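Combining (5.24), (5.26), (5.28) and (5.29), we deduce that

(5.30) $\sum_{j=1}^{m} j^{2\alpha} \big\{\int g(\hat\phi_j - \phi_j)\big\}^2 \le 4 \sum_{j=1}^{m} j^{2\alpha} T_{j1}^2 + O_p(n^{-(2\beta-1)/(\alpha+2\beta)}).$

We shall prove in Section 5.3 that

(5.31) $\sum_{j=1}^{m} j^{2\alpha} E(T_{j1}^2) = O(n^{-(2\beta-1)/(\alpha+2\beta)}).$

The desired result (5.15) follows from (5.23), (5.30) and (5.31). This completes the proof of (3.5).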

5.2. Proof of Lemma 5.1. To derive (5.3), observe that on subtracting the expansions of $K$ and $L$ in (2.1) and (5.1), respectively, we obtain an expansion of $K - L$. Multiplying both sides of this by $\psi_j(u)\phi_j(v)$ and integrating over $u$ and $v$, we deduce that

(5.32) $(\kappa_j - \lambda_j) \int \psi_j \phi_j - \int (K - L)\psi_j \phi_j = 0.$

Since $\psi_j = \phi_j + \chi_j + (\psi_j - \phi_j - \chi_j)$, we have



(5.33) 



A 



< IIA 
{K-L){^j-^,-X,)



(5.34) 



j{K- L)Aj^j 



< 



{K{u,v) — L{u,v)}<j)j{u) du 



dv. 



Result (5.3) follows from (5.32)-(5.34). 

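The expansions of $K$ and $L$ in (2.1) and (5.1) may be used to prove that

$\lambda_j (\psi_j - \phi_j) = \int K(\psi_j - \phi_j) + \int (L - K)\psi_j - (\lambda_j - \kappa_j)\phi_j.$

Multiplying both sides by $\phi_k$ and integrating, we deduce that

$\lambda_j \int (\psi_j - \phi_j)\phi_k = \kappa_k \int (\psi_j - \phi_j)\phi_k + \int (L - K)\psi_j \phi_k - (\lambda_j - \kappa_j)\delta_{jk},$

where $\delta_{jk}$ denotes the Kronecker delta. Equivalently, provided $\lambda_j \ne \kappa_k$, we have

$\int (\psi_j - \phi_j)\phi_k = (\lambda_j - \kappa_k)^{-1} \int (L - K)\psi_j \phi_k - \delta_{jk}.$

Result (5.4) follows from this formula and the fact that

$\|\psi_j - \phi_j\|^2 = \sum_{k=1}^{\infty} \big\{\int (\psi_j - \phi_j)\phi_k\big\}^2.$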

5.3. Proofs of (5.9), (5.14), (5.22) and (5.31). Direct calculation shows that $E(\hat K - K)^2 = O(n^{-1})$, uniformly on $\mathcal{I} \times \mathcal{I}$. It follows that $E(\hat\Delta^2) = O(n^{-1})$. Note, too, that by Parseval's identity, $\hat\Delta^2 = \sum_j \hat\Delta_j^2$ and so $\sup_j E(\hat\Delta_j^2) = O(n^{-1})$.

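This gives the first part of (5.9). To derive the second part, assume without loss of generality that $E(X) = 0$ and observe that

(5.35) $\int (\hat K - K)\phi_j \phi_k = n^{-1} \sum_{i=1}^{n} (\xi_{ij}\xi_{ik} - E\xi_j \xi_k) - \bar\xi_j \bar\xi_k,$

where $\xi_{ij} = \int X_i \phi_j$, $\bar\xi_j = n^{-1} \sum_i \xi_{ij}$ and $\xi_j$ denotes a generic $\xi_{ij}$. Therefore, using the fact that $E(\xi_j^4) \le C_1 (E\xi_j^2)^2$, where $C_1 > 0$ does not depend on $j$, we deduce that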

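where $C_2$ does not depend on $j$. This implies the second part of (5.9).

To prove (5.14), observe that

$\int (\hat g - g)\phi_j = n^{-1} \sum_{i=1}^{n} \Big\{ \xi_{ij} \Big( \int bX_i + \varepsilon_i \Big) - E\Big( \xi_j \int bX \Big) \Big\} - \bar\xi_j \Big( \int b\bar X + \bar\varepsilon \Big),$

where $\bar\varepsilon = n^{-1} \sum_i \varepsilon_i$. It may thus be proved that

$n E\big\{\int (\hat g - g)\phi_j\big\}^2 \le C_3 \Big\{ \mathrm{var}\Big( \xi_j \int bX \Big) + \mathrm{var}(\varepsilon\, \xi_j) \Big\} \le C_4 (E\xi_j^4)^{1/2} \le C_5 \kappa_j,$

which implies (5.14).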

To obtain (5.22), note that by (5.35) and the fact that $E(\xi_j^4) \le C_1 (E\xi_j^2)^2$, we have

$n E\big\{\int (\hat K - K)\phi_j \phi_k\big\}^2 \le C_6 E(\xi_j^2 \xi_k^2) \le C_7 (E\xi_j^4 \cdot E\xi_k^4)^{1/2} \le C_8 \kappa_j \kappa_k,$

uniformly in $j$ and $k$. Result (5.22) follows directly on substitution and employing the argument leading to (5.25).
Again using (5.35), we have 

n 

i=l k : kj^j 

from which it may be proved that since -E'd'^fci • . •'^fc4l) < Ylei-^^tiy^^ ^ 
nE{Tl) < CgEUj 9k{Kj - A^fe)-'^^ j' 

Ikiikij^j ki-.ki^i 



< 



n \9kt{l^3 - l^kd ^|| 



1/2 



<Cu^A E 

\:kytj 



9kKk 



4 



uniformly in $j$. Therefore, $\sum_{j \le m} j^{2\alpha} E(T_{j1}^2) \le C_{16}\, n^{-1} m^{\alpha+1}$, which implies (5.31).

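5.4. Proof of (3.6). Let $\mathcal{I} = [0, 1]$, $\phi_1 \equiv 1$ and $\phi_{j+1}(t) = 2^{1/2} \cos(j\pi t)$ for $j \ge 1$. Put $b_j = \theta_j j^{-\beta}$ for $L_n + 1 \le j \le 2L_n$ and $b_j = 0$ otherwise, where $L_n$ denotes the integer part of $n^{1/(\alpha+2\beta)}$ and each $\theta_j$ is either 0 or 1. Let $\kappa_j = j^{-\alpha}$ and write $Z_1, Z_2, \ldots$ for independent random variables, all with the uniform distribution on $[-3^{1/2}, 3^{1/2}]$. Note that $E(Z_j) = 0$, $E(Z_j^2) = 1$ and that the $Z_j$'s are observable if $X$ is observable since $Z_j = j^{\alpha/2} \int_{\mathcal{I}} X\phi_j$. Set $X = \sum_j j^{-\alpha/2} Z_j \phi_j$ and

$Y = \int_{\mathcal{I}} bX + \varepsilon = \sum_{j=L_n+1}^{2L_n} \theta_j\, j^{-(\alpha+2\beta)/2} Z_j + \varepsilon,$

where the error $\varepsilon$ is taken to be Gaussian with zero mean. Then we may write $b = \sum_{L_n+1 \le j \le 2L_n} \theta_j j^{-\beta} \phi_j$, and if $\tilde b$ is an estimator of $b$, it follows that

(5.36) $\tilde\theta_j = j^{\beta} \int_{\mathcal{I}} \tilde b\, \phi_j$

is an estimator of $\theta_j$. An argument based on the Neyman-Pearson lemma shows that

$\liminf_{n \to \infty}\ \inf_{\tilde\theta_j}\ \sup{}^{*}\, E(\tilde\theta_j - \theta_j)^2 > 0,$

where $\sup^*$ denotes the supremum over all $2^{L_n}$ different distributions of $(X, Y)$ obtained by taking different choices of $\theta_{L_n+1}, \ldots, \theta_{2L_n}$; and $\inf_{\tilde\theta_j}$ represents the infimum over all measurable functions $\tilde\theta_j$ of the data. Therefore, if an estimator $\hat b$ is given and $\hat\theta_{L_n+1}, \ldots, \hat\theta_{2L_n}$ are the respective estimators of $\theta_{L_n+1}, \ldots, \theta_{2L_n}$ obtained by substituting $\hat b$ for $\tilde b$ in (5.36), then for constants $D_1, D_2 > 0$ which do not depend on the choice of the measurable function $\hat b$,

$\sup{}^{*} \int_{\mathcal{I}} E_F(\hat b - b)^2 \ge \sup{}^{*} \sum_{j=L_n+1}^{2L_n} j^{-2\beta} E_F(\hat\theta_j - \theta_j)^2 \ge D_1 \sum_{j=L_n+1}^{2L_n} j^{-2\beta} \ge D_2\, n^{-(2\beta-1)/(\alpha+2\beta)}.$

This proves (3.6).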
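
The construction of $X$ in this proof can be simulated directly. The sketch below is illustrative only: it truncates the infinite expansion at a hypothetical `n_basis` terms, evaluates the cosine basis on a midpoint grid, and recovers each $Z_j$ from $X$ by numerical integration, mirroring the remark that the $Z_j$'s are observable when $X$ is. The function name and all default parameter values are assumptions.

```python
import numpy as np

def simulate_lower_bound_design(n_curves=5, n_basis=20, alpha=2.0, grid=512, seed=0):
    """Simulate X = sum_j j^{-alpha/2} Z_j phi_j (truncated) and recover the Z_j."""
    rng = np.random.default_rng(seed)
    t = (np.arange(grid) + 0.5) / grid                 # midpoint grid on [0, 1]
    j = np.arange(1, n_basis + 1, dtype=float)
    # Cosine basis: phi_1 = 1, phi_{j+1}(t) = sqrt(2) * cos(j * pi * t)
    phi = np.ones((n_basis, grid))
    phi[1:] = np.sqrt(2.0) * np.cos(np.pi * np.outer(j[:-1], t))
    # Independent scores, uniform on [-sqrt(3), sqrt(3)] (mean 0, variance 1)
    Z = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n_curves, n_basis))
    X = (Z * j ** (-alpha / 2.0)) @ phi                # curves evaluated on the grid
    # Z_j = j^{alpha/2} * integral of X * phi_j, via the midpoint rule
    Z_recovered = (X @ phi.T) / grid * j ** (alpha / 2.0)
    return Z, Z_recovered

Z, Z_recovered = simulate_lower_bound_design()
print(np.max(np.abs(Z - Z_recovered)))                 # should be near machine precision
```

On a midpoint grid the cosine functions used here are exactly orthonormal under the midpoint rule as long as `n_basis` does not exceed `grid`, so the recovery error reflects floating-point rounding only.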